

### Xilinx Virtex-5QV (V5QV) Independent SEU Data



Melanie Berg, AS&D Inc. in support of NASA/GSFC

Melanie.D.Berg@NASA.gov

Kenneth LaBel, NASA/GSFC

Jonathan Pellish, NASA/GSFC



## NASA Goddard Radiation Effects and Analysis Group (REAG) FPGA Testing Supporters and Collaborators

#### Supporters:

- Defense Threat Reduction Agency (DTRA)
- NASA Electronics Parts and Packaging (NEPP) Program

#### Collaborators:

- Xilinx
- GSFC Satellite Servicing Capabilities Office (SSCO)



#### **Acronyms**

- Block random access memory (BRAM)
- Built-in-self-test (BIST)
- Combinatorial logic (CL)
- Configurable Logic Block (CLB)
- Device under test (DUT)
- Digital Clock Manager (DCM)
- Digital Signal Processing Block (DSP)
- Distributed triple modular redundancy (DTMR)
- Dual interlocked storage cell (DICE)
- Edge-triggered flip-flops (DFFs)
- Field programmable gate array (FPGA)
- Global triple modular redundancy (GTMR)
- Input output (I/O)
- Linear energy transfer (LET)
- Local triple modular redundancy (LTMR)
- Look up table (LUT)
- Low cost digital tester (LCDT)
- Mitigated DCM (MITDCM)
- Power on reset (POR)
- Probability of logic masking (P<sub>logic</sub>)
- Radiation Effects and Analysis Group (REAG)
- Single-event effects Immune Reconfigurable FPGA (SIRF)
- Single event functional interrupt (SEFI)
- Single event latchup (SEL)
- Single event transient (SET)
- Single event upset (SEU)
- Single event upset cross-section (σ<sub>SEU</sub>)
- Static random access memory (SRAM)
- System on a chip (SOC)
- Universal Serial Bus (USB)
- Virtex-5QV (V5QV)
- Windowed Shift Register (WSR)

#### Virtex-5QV Investigation Overview



- This is an independent study to determine the single event destructive and transient susceptibility of the Xilinx Virtex-5QV (SIRF) device.
- The DUT is configured to have various test structures that are geared to measure specific potential susceptibilities of the device.
- Design/Device susceptibility is determined by monitoring DUTs for Single Event Transient (SET) and Single Event Upset (SEU) induced faults while exposing them to a heavy ion beam.
- Test strategies are based on the NASA Goddard REAG FPGA SEU Test guidelines manual:
  - https://nepp.nasa.gov/files/23779/fpga\_radiation\_test\_guidelines\_2012.pdf

## Characterizing SEUs: Radiation Testing and SEU Cross Sections



SEU Cross Sections ( $\sigma_{seu}$ ) characterize how many upsets will occur based on the number of ionizing particles to which the device is exposed.

$$\sigma_{seu} = \frac{\#errors}{fluence}$$

#### **Terminology:**

- Flux: Particles/(sec-cm²)
- Fluence: Particles/cm<sup>2</sup>

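$\sigma_{seu}$ is calculated at several LET values (particle spectrum).

Testing with a low flux is imperative with the Xilinx V5QV: given the complexity of the device, an accelerated rate of exposure can produce upset rates and upset accumulation that the device would not experience in its intended environment.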
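Once fluence is expressed as flux × exposure time, the cross-section definition is a one-line calculation. The Python sketch below assumes a constant flux over each beam run; the function name `cross_section` and the run values are hypothetical illustrations, not V5QV data.

```python
from typing import Dict, Tuple


def cross_section(errors: int, flux: float, seconds: float) -> float:
    """Return sigma_SEU in cm^2, i.e., #errors / fluence.

    Assumes a constant flux (particles/(s*cm^2)) for the whole run, so
    fluence (particles/cm^2) = flux * exposure time.
    """
    if flux <= 0 or seconds <= 0:
        raise ValueError("flux and exposure time must be positive")
    fluence = flux * seconds
    return errors / fluence


# Hypothetical runs keyed by LET (MeV-cm^2/mg): (errors, flux, seconds).
runs: Dict[float, Tuple[int, float, float]] = {
    5.7: (12, 1.0e4, 1000.0),
    20.6: (85, 1.0e4, 1000.0),
}

# One sigma_SEU point per LET value builds up the cross-section curve.
curve = {let: cross_section(*run) for let, run in runs.items()}
for let, sigma in sorted(curve.items()):
    print(f"LET {let:5.1f}: sigma_SEU = {sigma:.2e} cm^2")
```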



## Understanding SEU Data and Their Applications to Complex Designs



- Along with providing  $\sigma_{SEU}$  data, aspects of how the data were obtained are discussed, such as:
  - Related test structure(s),
  - Speed of operation, and
  - Reasoning behind the test strategy.
- A goal of SEU radiation testing is to eventually extrapolate SEU data to critical applications (designs).
- Designs are complex. Without an understanding of how and why data are obtained, extrapolation will be inaccurate and can be detrimental to the success of a mission.



## FPGA Structure Categorization as defined by NASA Goddard REAG $\sigma_{SEU}$ Differentiation:

$$P(fS)_{error} \propto P_{Configuration} + P(fS)_{functionalLogic} + P_{SEFI}$$

- Configuration $\sigma_{SEU}$
- Sequential and combinatorial logic (CL) in the data path
- Global routes and hidden logic

SEU testing is required in order to characterize the $\sigma_{\text{SEU}}$ for each of these FPGA categories.



#### **V5QV Accelerated SEU Testing**

## **Best Practice for Radiation Test Setups: Functional Control**



 Types of DUT functional input control: clocks, resets, and data inputs.

#### Concerns:

- Synchronizing inputs and managing skew between inputs; this is challenging at high frequencies.
- Operating the device in a realistic manner:
  - Do not over-load the device with unrealistic stimulus during radiation testing. If the device is operating in states that would never occur, then radiation data will not be characteristic.
  - Do not under-load the device during radiation testing. If the device is underperforming, this means that a large amount of circuitry is not operating. This produces operational states with a large amount of logic masking; consequently, radiation data will not be characteristic.

## **Best Practice for Radiation Test Setups: Power Control**



 Types of voltage controllers: power supplies and special on-board voltage regulation circuitry.

#### Concerns:

- Device may draw a larger amount of current than originally expected. Cooling apparatus may be necessary during operation.
- Power glitching or Single Event Latch-up (SEL) can cause the system to cease operation or be damaged. Hence it is best practice to separate test vehicle power from DUT power. It is also ideal to have current limiting circuitry for the test vehicle and the DUT.

## **Best Practice for Radiation Test Setups: Monitoring Functional Upsets**



- Compare DUT outputs to expected values. This can be done:
  - Visually (only recommended as a supplement); i.e., watching the error indication on the error detection equipment (e.g., logic analyzer or oscilloscope);
  - Using equipment event triggers; or
  - Custom comparison circuitry.
- Differentiate upset types: e.g., clock tree SET, DFF SEU, CL captured SET, or configuration faults.
- Count SEUs (upset statistics): After the upsets have been detected and differentiated, they need to be counted. The higher the number of upsets, the better the statistics.
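One common way to quantify "the higher the number of upsets, the better the statistics" is to treat the upset count N as Poisson distributed, so its standard deviation is √N and the relative uncertainty is 1/√N. The sketch below uses the normal approximation to the Poisson interval, an assumption that is reasonable only for moderately large counts; small counts call for exact Poisson (chi-square) limits. The function names are hypothetical.

```python
import math
from typing import Tuple


def sigma_with_interval(errors: int, fluence: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (sigma, low, high) in cm^2 using a normal approximation.

    The error count is modeled as Poisson, so its standard deviation is
    sqrt(errors); z = 1.96 gives an approximate 95% interval.
    """
    if errors < 0 or fluence <= 0:
        raise ValueError("need errors >= 0 and fluence > 0")
    sigma = errors / fluence
    half_width = z * math.sqrt(errors) / fluence
    return sigma, max(0.0, sigma - half_width), sigma + half_width


def relative_uncertainty(errors: int) -> float:
    """Relative 1-sigma uncertainty of a Poisson count: 1/sqrt(N)."""
    if errors <= 0:
        raise ValueError("need at least one error")
    return 1.0 / math.sqrt(errors)


# Ten times more upsets shrinks the relative uncertainty by about sqrt(10).
for n in (10, 100, 1000):
    print(n, f"{relative_uncertainty(n):.1%}")
```

For a fixed fluence, replicating logic (for example, hundreds of counters instead of one) raises N and therefore tightens the interval.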

### **Best Practice for Radiation Test Setups: Automated Data Capture and Messaging**



- Reliably capture data:
  - Follow synchronous design rules which include how to capture asynchronous signals.
  - Determine minimal sampling frequency (when applicable).
  - Understand the limitations of the automated test equipment with respect to the DUT (e.g., memory (storage) space and speed).
- Once erroneous data are captured, they should be packaged and stored (e.g., sent to a host PC).
  - Timestamp
  - Expected value
  - Received value
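The capture-and-package step can be sketched as a comparison loop that keeps only mismatches, each tagged with a timestamp, the expected value, and the received value. This minimal Python model assumes the tester already has aligned expected and received words per cycle; `ErrorRecord`, `capture_errors`, and the comma-separated `to_line` format are hypothetical, not the REAG tester's message format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ErrorRecord:
    timestamp: int  # tester cycle count at which the mismatch was observed
    expected: int
    received: int


def capture_errors(expected: Iterable[int], received: Iterable[int]) -> List[ErrorRecord]:
    """Compare DUT outputs against expected values cycle by cycle."""
    records = []
    for cycle, (exp, got) in enumerate(zip(expected, received)):
        if exp != got:
            records.append(ErrorRecord(cycle, exp, got))
    return records


def to_line(record: ErrorRecord) -> str:
    """Package one record as a text line for the host PC (hypothetical format)."""
    return f"{record.timestamp},{record.expected:#x},{record.received:#x}"


# Checkerboard data (0xAA, 0x55, ...) with one bit flipped at cycle 3.
expected = [0xAA, 0x55] * 4
received = list(expected)
received[3] ^= 0x01
for rec in capture_errors(expected, received):
    print(to_line(rec))
```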

#### **Example of a V5QV Error Record**





| Description |
|-------------|
| Current captured data (cycle N) |
| Previous captured data (cycle N-1) |
| Unused: data pattern is always checkerboard |
| Unused |
| Cycle counter. Multiply by the DUT clock period (i.e., divide by the DUT frequency) to convert to time. Used to determine error burst sequences. |
| Indicates type of error record:<br>"001": Timeout; one of the shift clocks was not detected<br>"011": Out of timeout; all shift clocks are recovered<br>"000": Error or non-error; current value does not equal previous value<br>"010": Debug check; command was sent to check value settings |
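The bit widths and word order of this record are not given here, so the sketch below starts from fields that have already been extracted. It decodes the record-type codes listed in the table and converts the cycle counter to time by dividing by the DUT clock frequency. `RECORD_TYPES`, `decode_record_type`, and `cycles_to_seconds` are hypothetical names.

```python
RECORD_TYPES = {
    "001": "timeout",            # one of the shift clocks not detected
    "011": "out of timeout",     # all shift clocks recovered
    "000": "error or non-error", # current value does not equal previous value
    "010": "debug check",        # command sent to check value settings
}


def decode_record_type(code: str) -> str:
    """Map a 3-bit record-type string to its meaning."""
    return RECORD_TYPES.get(code, "unknown")


def cycles_to_seconds(cycle_count: int, dut_clock_hz: float) -> float:
    """Time = cycles * clock period = cycles / clock frequency."""
    if dut_clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return cycle_count / dut_clock_hz


# Example: a timeout record captured 1,500,000 cycles into a 150 MHz run.
print(decode_record_type("001"), cycles_to_seconds(1_500_000, 150e6))
```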

## **Best Practice for Radiation Test Setups: Monitoring Power**



- Use of power supply monitors or specialized onboard (tester) circuitry.
- Use of an automated monitor/capture system is beneficial. Provides the ability to perform post processing on power data and to identify particular error signatures.
- As previously mentioned, the ability to automatically power down or limit current if the DUT current gets too high is beneficial.

For accelerated V5QV SEU testing, we used all of the above.
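As one example of post-processing captured power data, the sketch below scans a logged current trace for two signatures: the first sample above a current limit, and abrupt step increases between consecutive samples (the kind of jump that can accompany SEL or some SEFIs). Hardware current limiting should still act first; this only analyzes logged data. The `CurrentSample` type, function names, and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CurrentSample:
    t: float     # seconds since the start of the beam run
    amps: float  # measured DUT supply current


def first_over_limit(samples: List[CurrentSample], limit_amps: float) -> Optional[float]:
    """Time of the first sample above the current limit, or None."""
    for s in samples:
        if s.amps > limit_amps:
            return s.t
    return None


def step_increases(samples: List[CurrentSample], step_amps: float) -> List[float]:
    """Times where current jumped by more than step_amps since the previous sample."""
    return [b.t for a, b in zip(samples, samples[1:]) if b.amps - a.amps > step_amps]


trace = [CurrentSample(float(t), a) for t, a in enumerate([0.50, 0.51, 0.50, 0.90, 0.92])]
print(first_over_limit(trace, limit_amps=0.8), step_increases(trace, step_amps=0.2))
```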

## Test Structure Configuration Mitigation: Scrubbing Specifics



Scrubbing is the act of writing into FPGA configuration memory while the device's functional logic is operating, with the intent of correcting configuration memory bit errors.

- Too many upsets in the system (due to accelerated flux) can cause unrealistic behavior... unrealistic σ<sub>SEU</sub>s!
- The accelerated upset rate can be managed by varying the flux.
- Make sure scrubbing can keep up with your upset rate.
- During irradiation, our scrub rate for the Xilinx V5QV is once every 100 ms (10 Hz).
- Read-back after a test with scrubbing should have a minimal number of configuration-bit upsets (excluding un-scrubbable bits).
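Whether scrubbing keeps up with the upset rate can be estimated by comparing the expected number of configuration upsets per scrub period (flux × device configuration cross-section × scrub period) with one. The sketch assumes upsets arrive uniformly at a constant flux. The 100 ms period matches the scrub rate quoted above; the device cross-section and the target of 0.1 upsets per scrub cycle are hypothetical.

```python
def upsets_per_scrub(flux: float, sigma_device_cm2: float, scrub_period_s: float) -> float:
    """Expected configuration upsets accumulated between two scrub passes."""
    return flux * sigma_device_cm2 * scrub_period_s


def max_flux(sigma_device_cm2: float, scrub_period_s: float, target: float = 0.1) -> float:
    """Highest flux that keeps the expected upsets per scrub period at `target`."""
    return target / (sigma_device_cm2 * scrub_period_s)


SCRUB_PERIOD_S = 0.100     # once every 100 ms (10 Hz)
SIGMA_DEVICE_CM2 = 1.0e-3  # hypothetical whole-device configuration cross-section

limit = max_flux(SIGMA_DEVICE_CM2, SCRUB_PERIOD_S)
print(f"keep flux below ~{limit:.0f} particles/(s*cm^2)")
```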

 $\sigma_{\text{SEU}}$ : SEU cross-section

#### **V5QV Test Set-up**







#### **Configuration and BRAM Testing**

## Procedures for Configuration and BRAM Testing



- Basic Configuration and BRAM Static Test:
  - Load FPGA configuration;
  - Irradiate device while the device is in a static state (no scrubbing of configuration memory);
  - Stop radiation beam and read back the configuration;
  - Count configuration and BRAM upsets; and
  - Normalize the upsets by the number of particles of exposure (Configuration and BRAM SEU cross section  $\sigma_{SEU}$ ).
- All tests (regardless of type) include configuration read-back after each beam-run.
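The count-and-normalize steps can be sketched as an XOR between the golden configuration image and the read-back image, followed by a population count. Two parts are assumptions rather than the procedure above: the optional mask, which excludes bits that legitimately differ (for example, bits that change during normal operation), and per-bit normalization by fluence × number of bits checked. The function names are hypothetical.

```python
from typing import Optional


def count_bit_upsets(golden: bytes, readback: bytes, mask: Optional[bytes] = None) -> int:
    """Count bits that differ between the golden and read-back images.

    mask: optional per-byte mask; a 1 bit means "check this bit", a 0 bit excludes it.
    """
    if len(golden) != len(readback) or (mask is not None and len(mask) != len(golden)):
        raise ValueError("images (and mask) must have the same length")
    upsets = 0
    for i, (g, r) in enumerate(zip(golden, readback)):
        diff = g ^ r
        if mask is not None:
            diff &= mask[i]
        upsets += bin(diff).count("1")
    return upsets


def per_bit_sigma(upsets: int, fluence: float, bits_checked: int) -> float:
    """sigma_SEU per bit in cm^2/bit."""
    return upsets / (fluence * bits_checked)


golden = bytes([0x00, 0xFF, 0x0F])
readback = bytes([0x01, 0xFF, 0x0E])
n = count_bit_upsets(golden, readback)
print(n, per_bit_sigma(n, fluence=1.0e7, bits_checked=8 * len(golden)))
```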

## Commercial versus Hardened Configuration Memory Heavy-ion SEU Data

σ<sub>SEU</sub>s of Virtex-5 Configuration Bits: Commercial (V5) versus Hardened (V5QV)



## Commercial versus Hardened BRAM Heavy-ion SEU Data



σ<sub>SEU</sub>s of Virtex-5 BRAM Bits: Commercial (V5) versus Hardened (V5QV)



#### **Investigating SEFIs**

We look for particular error signatures to determine SEFI occurrence:

- If read-back of the configuration is mostly logic '0', assume a Power On Reset (POR) glitch.
- If the device cannot be connected to for read-back, assume a problem in the configuration interface:
  - Hidden (to user) state machines
  - Configuration registers
- Global upsets in functional logic, which are not observable during static read-back:
  - Corrected by reset: clock tree or reset tree (global routing) upset.
  - Corrected by configuration correction: configuration bit upset, not considered a SEFI.
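The signature checks above can be expressed as a small decision procedure. This is a sketch under stated assumptions: a missing read-back stands for "unable to connect", "mostly logic '0'" is approximated as more than a hypothetical 50% of the golden image's '1' bits reading back as '0', and the caller supplies whether a global functional upset was observed and whether a reset cleared it. All names and thresholds are hypothetical.

```python
from typing import Optional


def ones_lost_fraction(golden: bytes, readback: bytes) -> float:
    """Fraction of the golden image's '1' bits that read back as '0'."""
    ones = sum(bin(g).count("1") for g in golden)
    if ones == 0:
        return 0.0
    lost = sum(bin(g & ~r & 0xFF).count("1") for g, r in zip(golden, readback))
    return lost / ones


def classify_signature(golden: bytes, readback: Optional[bytes], global_upset: bool = False,
                       cleared_by_reset: bool = False, por_threshold: float = 0.5) -> str:
    if readback is None:
        return "sefi_config_interface"  # could not connect for read-back
    if ones_lost_fraction(golden, readback) > por_threshold:
        return "sefi_por"               # configuration mostly forced to '0'
    if global_upset:
        # Reset cleared it: clock/reset tree. Otherwise assume configuration correction did.
        return "sefi_global_route" if cleared_by_reset else "config_upset_not_sefi"
    return "no_sefi_signature"


golden = bytes([0xFF] * 4)
print(classify_signature(golden, bytes(4)))  # every golden '1' read back as '0'
```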

#### **V5QV Configuration SEFIs**







Most SEFI error signatures were large areas of the configuration bits forced to '0'. This resembles a power on reset (POR) hit.



# Xilinx V5QV Heavy Ion Accelerated Testing: Functional Data Path (dynamic operation) and Functional SEFIs (i.e., global routes)



## Xilinx V5QV Heavy Ion Accelerated Testing: Test Structure Development

- We start with simple test structures.
- We increase complexity per test structure.
- We study trends.
- We try to make sense out of the convoluted data obtained from complex test structures.

Test Structure Considerations Taken from the NASA Goddard REAG FPGA SEU Test Guidelines:

https://nepp.nasa.gov/files/23779/fpga\_radiation\_test\_guidelin

V5 is a commercial Xilinx field programmable gate array device; V5QV is a radiation-tolerant device.

#### Best Practice for Radiation Testing: Logic Replication for Statistics



### Best-Practice for DUT Test Structure Development

Test structures should contain a large amount of replicated logic in order to improve statistics.

#### SEU testing with hundreds of counters versus only one





## Best Practice for Radiation Testing: State Space Traversal



## Best-Practice for DUT Test Structure Development

A test structure's state space should be traversable such that it can be covered within one radiation test run.

#### Otherwise:

- A significant amount of circuitry and system states are not tested.
- The result is SEU data that are uncharacteristic of the design.
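For a binary counter, the traversal requirement reduces to simple arithmetic: visiting all 2^w states of a w-bit counter takes 2^w clock cycles. The sketch assumes one increment per clock. The 50 MHz clock matches a frequency used in the test tables; the counter widths and the 10-minute run length are hypothetical.

```python
def traversal_time_s(state_bits: int, clock_hz: float) -> float:
    """Time for a binary counter to visit all 2**state_bits states once."""
    return (2 ** state_bits) / clock_hz


def covered_in_run(state_bits: int, clock_hz: float, run_s: float) -> bool:
    """True if one beam run is long enough to traverse the full state space."""
    return traversal_time_s(state_bits, clock_hz) <= run_s


for bits in (8, 16, 32, 48):
    t = traversal_time_s(bits, 50e6)
    fits = covered_in_run(bits, 50e6, 600.0)
    print(f"{bits:2d}-bit counter at 50 MHz: {t:.3g} s, fits a 10-minute run: {fits}")
```

A 48-bit counter at 50 MHz would need roughly 65 days, which is why wide, monolithic state spaces are replaced with many small, fully traversable ones.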



#### Best Practice for Radiation Testing: Logic Masking



#### **Best-Practice for DUT Test Structure Development**

Logic masking should be minimized or controllable (i.e., taken into account).

Any logic gate with more than one input will have logic masking except for XOR or XNOR gates.

P<sub>logic</sub> is the probability that an upset is not masked, i.e., that it can propagate and be captured by the system.

P<sub>logic</sub> = 0: path is 100% masked

P<sub>logic</sub> = 1: path has no masking
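P<sub>logic</sub> for a single gate can be estimated by enumerating its truth table: flip one input and count how often the output changes. The sketch assumes every input combination is equally likely, which real designs rarely satisfy, and follows the convention P<sub>logic</sub> = 1 for no masking. The function and gate names are hypothetical.

```python
from itertools import product
from typing import Callable


def p_logic(gate: Callable[..., int], n_inputs: int, upset_input: int = 0) -> float:
    """Fraction of input combinations for which flipping one input flips the output.

    1.0 means the path through that input is never masked; 0.0 means it is always masked.
    """
    combos = list(product((0, 1), repeat=n_inputs))
    propagated = 0
    for bits in combos:
        flipped = list(bits)
        flipped[upset_input] ^= 1
        if gate(*bits) != gate(*flipped):
            propagated += 1
    return propagated / len(combos)


def and2(a: int, b: int) -> int:
    return a & b


def or2(a: int, b: int) -> int:
    return a | b


def xor2(a: int, b: int) -> int:
    return a ^ b


def and3(a: int, b: int, c: int) -> int:
    return a & b & c


for name, gate, n in (("AND2", and2, 2), ("OR2", or2, 2), ("XOR2", xor2, 2), ("AND3", and3, 3)):
    print(name, p_logic(gate, n))
```

The XOR result of 1.0 matches the statement that XOR/XNOR gates do not mask, while each extra AND input halves P<sub>logic</sub> under the uniform-input assumption.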



## Best Practice for Radiation Testing: Avoiding Unrealistic SEU Accumulation



## Best Practice characteristics of a DUT design

Avoid unrealistic SEU accumulation from accelerated testing:

- Use flush-through test structures; e.g., shift registers.
- Small number of gates per sub-test structure; e.g., testing hundreds of counters.

SRAM-based FPGAs: scrubbing (correcting) configuration SEUs is extremely important during accelerated testing; the scrubber must keep up with the particle flux to avoid accumulation.



#### Best Practice for Radiation Testing: Increasing Visibility



#### Best Practice characteristics of a DUT design

All (or a significant percentage of) potential upsets should be observable during testing.

Test structures can be designed to enhance observable nodes; e.g., shift registers, counters, scan rings, and internal logic taps.



If an SEU occurs, will it propagate to I/O before the test is complete?

## Difference between Test Structure and Application Specific Design



- A <u>test structure</u> is a design implemented in a DUT that is created specifically for SEU testing.
- An <u>application-specific design</u> is circuitry implemented in a DUT that is either the final design targeted for space or a subset of the final design.

#### **Use of Test Structures versus Application Specific Designs for Acquiring σ<sub>SEU</sub> Data**

- Although error rates and error responses are design dependent, useful information can be extrapolated from test structures to application-specific designs.
- Why use test structures?
  - By the time the final design is complete, it is usually too late to perform radiation testing on it.
  - It can be too difficult to apply input-stimuli to an application specific design.
  - It can be too difficult to monitor DUT responses of application specific designs.

Test structures can be constructed to meet SEU-testing best practices.

## Additional Challenges using Application Specific Designs for SEU Testing

- Statistics are poor, usually because there is not a significant amount of replication.
- In addition, trends for specific elements cannot be clearly identified or established.
- The state space of a complex design cannot be traversed within one radiation test run.
- Application-specific designs contain a significantly higher number of masked data paths than test structures.
- It is difficult to control SEU accumulation in an accelerated test environment.

Many best practice considerations are violated.

#### Benefits of Testing Application Specific Designs



- Increased observation of error responses specific to the application.
- However, the user must be aware of the following:
  - Unrealistic SEU accumulation in an accelerated environment,
  - Limited visibility due to masking and fractional state space traversal,
  - Poor statistics due to the variance in design circuits, and
  - $\sigma_{\text{SEU}}$s that will most likely have a large variance if circuits cannot be isolated and controlled.



## Test Structures used for Dynamic V5QV SEU Testing

| Test Structure                         | Frequency<br>Range | Additional Fault Tolerance |
|----------------------------------------|--------------------|----------------------------|
| Shift registers                        | 2 kHz – 300 MHz    | Yes                        |
| Counters                               | 2 kHz – 150 MHz    | Yes                        |
| Global routes                          | 2 kHz – 150 MHz    | Yes                        |
| MicroBlaze™                            | 50 MHz             | Yes                        |
| Digital Signal Processors (DSP blocks) | 2 kHz – 150 MHz    | No                         |

#### **Test Structures: Shift Registers**



- Shift registers are great for baselining σ<sub>SEU</sub>s.
- Simple architecture with no masking.
- Large number of stages are easily implemented to achieve good statistics.

| Caveats to traditional shift register SEU testing | NASA Goddard REAG's solution                     |
|---------------------------------------------------|--------------------------------------------------|
| High speed data input synchronization             | Internal data generation                         |
| High speed data output capture                    | Windowed shift registers                         |
| Use of built-in-self-test (BIST) counter for SEUs | With the use of WSRs, no need for a BIST counter |
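A behavioral model shows why internal data generation plus output comparison suffices for shift-register testing: a checkerboard pattern enters the first stage, and any flipped stage appears at the output a predictable number of cycles later. This Python sketch is a simplified stand-in, not the REAG windowed shift register design; `run_shift_register` and its upset-injection argument are hypothetical.

```python
from typing import List, Optional, Tuple


def run_shift_register(stages: int, cycles: int,
                       upset: Optional[Tuple[int, int]] = None) -> List[int]:
    """Simulate a 1-bit shift register fed by an internal checkerboard generator.

    upset = (cycle, stage) flips one stage at the start of that cycle to model an SEU.
    Returns the cycles at which the output disagreed with the expected pattern.
    """
    reg = [0] * stages  # reg[0] is the input stage; reg[-1] drives the output
    errors = []
    for cycle in range(cycles):
        if upset is not None and upset[0] == cycle:
            reg[upset[1]] ^= 1
        # The output bit was loaded `stages` cycles ago; check once the register is full.
        if cycle >= stages and reg[-1] != (cycle - stages) % 2:
            errors.append(cycle)
        reg = [cycle % 2] + reg[:-1]  # clock edge: shift and load the next pattern bit
    return errors


print(run_shift_register(8, 40, upset=(20, 3)))
```

Because the expected output is computed from the cycle count alone, no BIST reference copy of the register is needed, and the error cycle locates the upset stage.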

#### **Test Structures: Windowed Shift Registers (WSRs)**







#### **Test Structures: Inserting Mitigation**



- V5QV embedded SEU Filter option: not available in the commercial FPGA device (it is V5QV specific).
- LTMR: user implemented. Do not use in the Virtex commercial family of devices...it is useless. However, it might be an option in the V5QV...see data section.
- DTMR: user implemented.
  - Implemented with and without area constraints
  - Can be used in the commercial device
- GTMR: user implemented.
  - Implemented with and without area constraints
  - Can be used in the commercial device
- Configuration memory scrubbing: user implemented and can be used in the commercial and the V5QV devices.
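All three user-implemented TMR variants rely on the same 2-of-3 voter; they differ in what is triplicated (DFFs only, DFFs + data paths, or DFFs + data paths + global routes). The generic sketch below shows only the bitwise voter and a triplicated counter whose feedback passes through it, the arrangement described as "all DFFs with feedback have voters." It is not a Xilinx or REAG netlist; the names are hypothetical.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit follows at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_counter_step(domains: List[int], width: int = 8) -> List[int]:
    """One clock of a triplicated counter whose feedback passes through a voter.

    Each domain's next state is computed from the voted value, so an upset in a
    single domain is outvoted and overwritten on the next clock.
    """
    voted = majority(*domains)
    nxt = (voted + 1) % (1 << width)
    return [nxt, nxt, nxt]


print(bin(majority(0b1010, 0b1010, 0b0110)))  # one domain differs: voted to 0b1010
print(tmr_counter_step([5, 5, 5 ^ 0b100]))    # upset in the third domain is corrected
```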

### Test Structures: V5QV Embedded Single Event Transient (SET) Filters

- The V5QV has embedded SET filters placed on the data input and clock input of each DFF. Usage is optional.
- Filters are expected to reduce the effects and the capture of SETs.
- Xilinx reports that the SET filters reduce susceptibility.
- NASA Goddard REAG has verified this claim.
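A common first-order model (not Xilinx's description of the V5QV filter) helps explain why filtering and clock frequency both matter. An SET of width w arriving at a random time is latched with probability of roughly w divided by the clock period, so capture grows with frequency. An ideal filter removes pulses narrower than its threshold. The pulse and filter widths below are hypothetical.

```python
def p_capture(pulse_width_s: float, clock_hz: float, filter_width_s: float = 0.0) -> float:
    """First-order probability that a DFF latches an SET.

    Assumes the SET arrives at a uniformly random time within the clock period and
    is latched if it overlaps the capturing edge, so P ~ pulse width / clock period
    (capped at 1). Setup/hold windows are ignored. The optional filter is modeled
    as an ideal threshold that removes any pulse no wider than filter_width_s.
    """
    if pulse_width_s <= filter_width_s:
        return 0.0
    period_s = 1.0 / clock_hz
    return min(1.0, pulse_width_s / period_s)


# Hypothetical 500 ps SET: capture probability scales with clock frequency.
for f_hz in (10e6, 50e6, 150e6):
    unfiltered = p_capture(500e-12, f_hz)
    filtered = p_capture(500e-12, f_hz, filter_width_s=600e-12)
    print(f"{f_hz / 1e6:5.0f} MHz: unfiltered {unfiltered:.4f}, filtered {filtered:.4f}")
```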



SET: Single Event Transient; DFF: flip-flop; REAG: Radiation Effects and Analysis Group, part of Code 561 at NASA/GSFC.

### Test Structures: Local Triple Modular Redundancy (LTMR)





LTMR is a mitigation strategy that can only be used in FPGAs with hardened configuration. It cannot be used in the commercial Virtex family of devices.



## Test Structures: Distributed Triple Modular Redundancy (DTMR)

DFFs + data paths are triplicated; all DFFs with feedback have voters. DFF = D flip-flop.

DTMR: minimally low

$$P(fS)_{error} \propto P_{Configuration} + P(fS)_{functionalLogic} + P_{SEFI}$$

Lowered

## Test Structures: Global Triple Modular Redundancy (GTMR)

DFFs + data paths + global routes are triplicated; all DFFs with feedback have voters.

DFF = D flip-flop



### σ<sub>SEU</sub> DATA: Investigating Frequency Effects with WSRs at 5.7 MeV-cm<sup>2</sup>/mg



#### WSR Strings Ar 0° 5.7 MeV-cm<sup>2</sup>/mg



### σ<sub>SEU</sub> DATA: Investigating Frequency Effects with WSRs at 20.6 MeV-cm<sup>2</sup>/mg



#### WSR Strings Kr 0° 20.6 MeV-cm<sup>2</sup>/mg



#### WSR<sub>0</sub> with respect to LET









#### WSR<sub>8</sub> with respect to LET



WSR<sub>8</sub> 10 kHz



#### WSR<sub>16</sub> with respect to LET





#### **WSR SEU Testing: Conclusions**



- WSR test structures were used to analyze: DFF, SET Filter, frequency effects, and efficacy of various mitigation strategies.
- SEU data illustrate the following:
  - Utilization of SET filters provides approximately a decade of improvement in DFF SEU susceptibility when not using DCMs.
  - Frequency effects show that DFF SEUs dominate SETs in the functional data path. Hence, the embedded DICE mitigation strategy for the DFFs is not as strong as embedded LTMR.
  - Implementing LTMR together with the SET filter provides no benefit over using the SET filter without LTMR.
  - Implementing DTMR does decrease overall  $\sigma_{SEU}$ s, but at an expensive price for area, power, and timing.



## Test Structures and Heavy Ion SEU Results: No DCM

#### Scrubber always turned on; lowest LET tested = 1.8 MeV-cm<sup>2</sup>/mg

| Test Structure | Frequency<br>Range | Additional Fault Tolerance |
|----------------|--------------------|----------------------------|
| Counters       | 50 MHz             | No                         |
| Global routes  | 50 MHz             | No                         |
| DCM            | 50 MHz             | Yes                        |

#### **Counter Test Structures**





In order to study global structures, various clocking schemes are connected to all of the counters (and snap-shot array) via a clock tree: input Clock (no DCM) versus DCM.

### Why Counter Arrays versus a String of Counters?



| Counter Array                                                                                                    | String of Counters                                                                                                                                  |
|------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Upsets are isolated per counter unless the upset is from a global route.                                         | Counters are co-dependent – hence it can be difficult to differentiate between a multiple bit upset, single bit or global route.                    |
| A custom tester can resynchronize with a counter that incurs an SEU. Built-in-self-test (BIST) is not necessary. | Implementing a string of counters requires complex arithmetic. It can be difficult to resynchronize after an error; consequently, BIST is usually necessary. |
| Full state space traversal. | Usually implemented with simple data patterns due to the complexity; hence, state space traversal is extremely limited. |
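The resynchronization advantage of counter arrays can be sketched as follows: the tester predicts each counter's next value, logs a mismatch, and then adopts the received value as the new reference, so a single upset produces one error record instead of a cascade. The sketch assumes each snapshot advances every counter by a fixed step; `check_counter_array` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def check_counter_array(snapshots: Sequence[Sequence[int]], width: int = 8,
                        step: int = 1) -> List[Tuple[int, int, int, int]]:
    """Return (snapshot index, counter index, expected, received) for each mismatch.

    snapshots[t][k] is the value of counter k at snapshot t. After a mismatch the
    tester resynchronizes to the received value instead of flagging every later sample.
    """
    mask = (1 << width) - 1
    reference = list(snapshots[0])
    errors = []
    for t in range(1, len(snapshots)):
        for k, got in enumerate(snapshots[t]):
            exp = (reference[k] + step) & mask
            if got != exp:
                errors.append((t, k, exp, got))
            reference[k] = got  # resynchronize (a no-op when got == exp)
    return errors


# Counter 1 is upset at snapshot 2 (expected 12, received 8), then keeps counting.
snaps = [[0, 10, 20], [1, 11, 21], [2, 8, 22], [3, 9, 23], [4, 10, 24]]
print(check_counter_array(snaps))
```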



### Differentiating SEUs: Upset types for SEU Analysis



- Global Upsets: σ<sub>SEU</sub>s for a sequence of upsets that lasts longer than a snapshot cycle (> 100s of μs):
  - Upsets are from the clock or reset and stem from the top of the global routing tree.
  - Could be the clock, a DCM, or a buffer located high in the global routing tree.
  - Error signature is sporadic and does not resemble a stuck fault, as a configuration-bit SEU induced error would.
- Burst: σ<sub>SEU</sub>s for a sequence of upsets that occurs within a snapshot cycle (< 100s of μs):
  - Upsets are from low in the global routing tree.
- Single Bit: σ<sub>SEU</sub>s for DFF (bit) flips in a counter.
- Snap Shot: σ<sub>SEU</sub>s for DFF (bit) flips in the snap shot array.
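Given error timestamps, the duration-based categories above can be sketched as a grouping step followed by a threshold against the snapshot cycle. The sketch does not model snapshot-array upsets, which need the error location rather than timing. The snapshot cycle, grouping gap, and timestamps are hypothetical, and the function names are illustrative.

```python
from typing import List


def group_events(times_s: List[float], max_gap_s: float) -> List[List[float]]:
    """Group error timestamps: a new event starts when the gap exceeds max_gap_s."""
    events: List[List[float]] = []
    for t in sorted(times_s):
        if events and t - events[-1][-1] <= max_gap_s:
            events[-1].append(t)
        else:
            events.append([t])
    return events


def classify_sequence(event: List[float], snapshot_cycle_s: float) -> str:
    """Single bit, burst (within a snapshot cycle), or global (longer than one)."""
    if len(event) == 1:
        return "single_bit"
    duration = event[-1] - event[0]
    return "global" if duration > snapshot_cycle_s else "burst"


SNAPSHOT_CYCLE_S = 200e-6  # hypothetical; the text gives 100s of microseconds
times = [0.0, 1.0, 1.00002, 1.00005, 5.0, 5.0001, 5.0005, 5.0009]
for ev in group_events(times, max_gap_s=1e-3):
    print(len(ev), classify_sequence(ev, SNAPSHOT_CYCLE_S))
```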

### Comparison of Various Component $\sigma_{SEU}$ s with SET Filter Off





Upsets start to converge at higher LETs. However, DFF upsets are dominant.

### Comparison of Various Component $\sigma_{SEU}$ s with SET Filter On



No DCM: SET Filter ON



Upsets start to converge at higher LETs. DFF upsets are less dominant than with the SET Filter off.

### Counter $\sigma_{SEU}$ s: SET Filter Off versus SET Filter On



No DCM: SET Filter Off versus SET Filter On Full LET Range



SET Filter On only makes a difference at low LET values.

### Counter $\sigma_{SEU}$ s: SET Filter Off versus SET Filter On: Zooming in on Low LET Values

No DCM: SET Filter Off versus SET Filter On - Low LET Range



SET Filter On decreases $\sigma_{\text{SEU}}$s by approximately 1.5 decades. SET Filter On also increases the onset LET.
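Cross-section curves with an onset LET are often summarized with a four-parameter Weibull fit, $\sigma(L) = \sigma_{sat}\left[1 - e^{-((L - L_0)/W)^{s}}\right]$ for $L > L_0$ and zero at or below the onset $L_0$. Improvements are expressed in decades as $\log_{10}$ of a cross-section ratio. The presentation does not state which fit, if any, was used; the sketch and its parameter values are hypothetical.

```python
import math


def weibull_sigma(let: float, sigma_sat: float, onset: float, width: float, shape: float) -> float:
    """Weibull cross-section model: zero at or below the onset LET."""
    if let <= onset:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-(((let - onset) / width) ** shape)))


def decades(sigma_high: float, sigma_low: float) -> float:
    """Improvement in decades between two cross-sections: log10 of their ratio."""
    return math.log10(sigma_high / sigma_low)


# Hypothetical fits: the filtered curve has a higher onset, so it is lower at low LET.
filter_off = dict(sigma_sat=1e-7, onset=1.0, width=20.0, shape=1.5)
filter_on = dict(sigma_sat=1e-7, onset=3.0, width=20.0, shape=1.5)
let = 5.7
print(decades(weibull_sigma(let, **filter_off), weibull_sigma(let, **filter_on)))
```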

### σ<sub>SEU</sub>s in Counter versus Snapshot Register



No DCM: SET Filter Off



Counters have a higher cross section than Snap Shot.
Counters are active every cycle; Snap Shot is only active every 4 cycles. Counters have more complex circuitry than Snap Shot.

### Comparing Global $\sigma_{SEU}$ s SET Filter Off versus SET Filter On



No DCM: Global SEUs: SET Filter On versus SET Filter Off



SET Filter On increases the onset LET for global $\sigma_{SEU}$s.

#### Global versus Burst $\sigma_{SEU}$ s





Off. However, Bursts slightly decrease at Low LET values with the SET Filter On.



## Test Structures and Heavy Ion SEU Results: With DCM

#### Scrubber always turned on; lowest LET tested = 5.7 MeV-cm<sup>2</sup>/mg

| Test Structure | Frequency<br>Range | Additional Fault Tolerance |
|----------------|--------------------|----------------------------|
| Counters       | 50 MHz             | No                         |
| Global routes  | 50 MHz             | No                         |

Additional data on counters will be provided in the test report and future papers

### Functional Logic Radiation Test Structures: Digital Clock Manager (DCM)



We are testing DCM susceptibility by connecting the block to a design with a state space with feedback.

### σ<sub>SEU</sub>s for DCM Utilization versus No DCM Utilization with SET Filter On



#### Mitigated DCM (MITDCM) versus DCM





#### DCM SET Filter On



#### **Counter SEU Testing: Conclusions**



- Counter-array test structures were used to analyze DFF, SET filter, global route, and DCM SEU susceptibility.
- SEU data illustrate the following:
  - Utilization of SET filters provides approximately a decade of improvement in DFF SEU susceptibility when not using DCMs.
  - Utilization of SET filters increases the onset LET for global routes; however, it does little for bursts.
  - Usage of DCMs significantly increases SEU susceptibility and may make SET filter utilization impractical.
  - The DCM mitigation strategy (MITDCM) did not improve $\sigma_{\text{SEU}}$s and proved to be a poor choice.
- Differentiating $\sigma_{SEU}$s is used to investigate SEU dominance and can be applied to determining component usage.
- Additional data and how they correlate to the $\sigma_{\text{SEU}}\text{s}$ illustrated in this presentation will be provided in the final test report.



## Test Structures and Heavy Ion SEU Results: DSP48E

#### Scrubber always turned on; lowest LET tested = 5.7 MeV-cm<sup>2</sup>/mg

| Test Structure | Frequency Range  | Additional Fault Tolerance |
|----------------|------------------|----------------------------|
| DSP48E         | 10 MHz – 150 MHz | No                         |
| Global routes  | 10 MHz – 150 MHz | No                         |

Additional data on counters will be provided in the test report and future papers.

### Virtex 5 Family Digital Signal Processing Blocks (DSPs): DSP48E



...internal to the DSP48E column. They are not accessible via fabric routing resources.

Virtex-5 FPGA XtremeDSP Design Considerations www.xilinx.com UG193 (v3.5) January 26, 2012

There are a total of 320 DSP48E blocks in the Virtex-5QV (XC5VFX130T).

### Test Structures: Strings of DSP48Es with TMR'd BIST

- A = constant
- B = registered (delayed) input
- C = input from last stage for accumulation
- BIST: built-in self-test
- TMR: triple modular redundancy



### Test Structures: String of DSP Logistics



- DSP48Es are programmed to perform: AB + C.
  - A string of DSP48Es accumulates each of the products to form a polynomial.

N = number of stages in the string of DSPs; Y(n) = the n-th output:

$$Y(n) = A_0 B(n) + A_1 B(n-1) + \cdots + A_N B(n-N) = \sum_{i=0}^{N} A_i B(n-i)$$

- Strings of DSPs are widely used in Finite Impulse Response (FIR) filters and image processing.
- Although prior slides suggested not to use BIST, BIST is advantageous when dealing with complex circuitry.
- Note that the BIST comparisons are triplicated. Voting is done in the tester, which keeps the added circuitry minimal.

#### **DSP48E** $\sigma_{SEU}$ Data





#### **DSP48E SEU Testing: Conclusions**



- SEU data illustrate the following:
  - Frequency does not seem to affect  $\sigma_{SEU}$ s.
  - Configuration upsets have little effect on DSP48Es.
  - $\sigma_{SEU}$ s are fairly low for the amount of processing power.
- Additional data and how they correlate to the σ<sub>SEU</sub>s illustrated in this presentation will be provided in the final test report.



## Test Structures and Heavy Ion SEU Results: MicroBlaze™

#### Scrubber always turned on; lowest LET tested = 5.7 MeV-cm<sup>2</sup>/mg

| Test Structure | Frequency<br>Range | Additional Fault Tolerance |
|----------------|--------------------|----------------------------|
| MicroBlaze™    | 50 MHz             | No                         |
| Global routes  | 50 MHz             | No                         |
| Caching        | 50 MHz             | Yes                        |

Additional data on counters will be provided in the test report and future papers.

#### **Processor and SRAM Communication**


SRAM: Static random access memory BRAM: Block random access memory

Processors talk to memory.

Most processor radiation tests detect errors through erroneous writes to SRAM memory.

 Visibility is significantly limited.




We increase visibility by replacing the external SRAM with the REAG low-cost digital tester (LCDT).

### More on Increasing Visibility with Microprocessor Testing (1)



- As previously stated, the embedded SRAM in the tester (BRAM) takes the place of normal memory accesses.
- In addition, each memory access is timestamped and logged in an alternate bank of BRAM. Only the last 512 accesses are kept.
- After each test run, the time stamped logs are output to the user.
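Keeping only the last 512 timestamped accesses is a circular-buffer pattern. The Python sketch below models it with `collections.deque(maxlen=...)`, which silently drops the oldest entry once full. `MemAccess` and `AccessLog` are hypothetical names, and the field layout is an assumption, not the LCDT's BRAM format.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class MemAccess:
    timestamp: int  # tester cycle count
    address: int
    data: int
    is_write: bool


class AccessLog:
    """Keeps only the most recent `depth` accesses, like a circular buffer in BRAM."""

    def __init__(self, depth: int = 512) -> None:
        self._log: Deque[MemAccess] = deque(maxlen=depth)

    def record(self, access: MemAccess) -> None:
        self._log.append(access)  # the oldest entry is dropped once the log is full

    def dump(self) -> List[MemAccess]:
        """Return the logged accesses, oldest first (e.g., after a test run)."""
        return list(self._log)


log = AccessLog()
for t in range(600):
    log.record(MemAccess(timestamp=t, address=0x100 + 4 * (t % 64), data=t, is_write=(t % 2 == 0)))
entries = log.dump()
print(len(entries), entries[0].timestamp, entries[-1].timestamp)
```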



### More on Increasing Visibility with Microprocessor Testing (2)





#### MicroBlaze<sup>TM</sup> SEU Testing: Conclusions



- Visibility was increased by isolating memory accesses as follows:
  - Moving the instruction and data storage to the LCDT for traffic observation.
  - Performing tests with and without cache to determine the influence cache has on upsets.
- Differentiating global upsets from the normal data set:
  - Helped to understand which upsets are prominent.
  - Gave insight to how the use of cache will affect  $\sigma_{SEU}$ s.
- Monitoring internal MicroBlaze™ signals:
  - $\sigma_{SEU}$s no longer rely on detecting erroneous memory reads and writes. Data are too limited and uninformative with sole reliance on memory reads and writes.
  - Can now determine when a processor crashes and how.

### Comparing MicroBlaze™ $\sigma_{SEU}$s and Global Clock $\sigma_{SEU}$s



SEU Cross Sections:
Cache vs. No Cache with Global Routes



#### **Summary**

- We presented a framework for evaluating complex digital systems targeted for harsh radiation environments, such as space.
- If performing accelerated SEU testing on an application specific design:
  - Understand the limitations of the resulting test data;
  - Be prepared for complex data de-convolution;
  - Pay attention to global structures;
  - Use basic-test structures to obtain an underlying understanding of DUT SEU behavior; and
  - Maximize visibility especially when testing application-specific designs.